Integrated computer-aided drug design and biophysical simulation approaches to determine natural anti-bacterial compounds for Acinetobacter baumannii

Acinetobacter baumannii is a nosocomial bacterial pathogen and is responsible for a wide range of diseases including pneumonia, necrotizing fasciitis, meningitis, and sepsis. The enzyme 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase (encoded by aroA gene) in ESKAPE pathogens catalyzes the sixth step of shikimate pathway. The shikimate pathway is an attractive drug targets pathway as it is present in bacteria but absent in humans. As EPSP is essential for the A. baumannii growth and needed during the infection process, therefore it was used as a drug target herein for high-throughput screening of a comprehensive marine natural products database (CMNPD). The objective was to identify natural molecules that fit best at the substrate binding pocket of the enzyme and interact with functionally critical residues. Comparative assessment of the docking scores allowed selection of three compounds namely CMNPD31561, CMNPD28986, and CMNPD28985 as best binding molecules. The molecules established a balanced network of hydrophobic and hydrophilic interactions, and the binding pose remained in equilibrium throughout the length of molecular simulation time. Radial distribution function (RDF) analysis projected key residues from enzyme active pocket which actively engaged the inhibitors. Further validation is performed through binding free energies estimation that affirms very low delta energy of <−22 kcal/mol in MM-GBSA method and <−12 kcal/mol in MM-PBSA method. Lastly, the most important active site residues were mutated and their ligand binding potential was re-investigated. The molecules also possess good druglike properties and better pharmacokinetics. Together, these findings suggest the potential biological potency of the leads and thus can be used by experimentalists in vivo and in vitro studies.


Materials and methods
The flow chart as shown in Fig. 1 summarizes the study methodology. Potential drug molecules were filtered and subsequently used in different applications of computer aided drug designing (CADD) 17,34,35 to investigate their potency to block biological functionality of EPSP synthase enzyme.
Retrieval EPSP synthase crystal structure and preparation for docking studies. Crystal structure of EPSP synthase enzyme was retrieved from Protein Data Bank (PDB) 36 . The structure is determined at a resolution of 2.37 Å, R-value of 0.186, and submitted under the PDB ID of 5BUF 25 . To prepare the enzyme structure for molecular docking, extra chain, water molecules, and co-crystallized ligands were removed in UCSF Chimera version 1.15 37 . The structure energy was then minimized through 1000 steepest descent and conjugate gradient algorithms keeping step size value to 0.02 Å. AMBER ff14SB 38 was utilized as a force field to parameterize enzyme backbone and side chains residues. To assess the general quality of the minimized structure, PDBSum Ramachandran plot 39 was generated and compared with the unminimized. The superimposed energy minimized EPSP enzyme over experimental EPSP enzyme is provided as Fig. S1. CMNPD preparation. To identify natural inhibitors against EPSP synthase enzyme, CMNPD database (https:// www. cmnpd. org/) 40 was considered. This manually curated database is freely accessible and considered a promising source of valuable novel leads collected from marine organisms. The CMNPD database comprises approximately 47,000 molecules of bacterial, fungal, and algae origin. The database was retrieved as .sdf and then consequently filtered in an online FAFDrugs4 server 41 where druglike soft, toxicophores, and Eli Lilly MedChem rules were applied in step-wise fashion to eliminate non drug-like, toxic 42 and PAINS compounds 43 , respectively. Afterward, the filtered compounds were imported to PyRx 0.8 44 , energy minimized, and converted to .pdbqt making the compounds ready to be used in molecular docking studies.
Site directed virtual screening (SDVS). SDVS of the filtered library against EPSP synthase enzyme was performed using two popular docking softwares; Genetic Optimization for Ligand Docking (GOLD) 5.2 45 and PyRx AutoDock Vina 44,46 to cross validate the docking prediction and select the best binding molecules based on consensus docking scores. In both docking methods, rigid docking approach was utilized. The binding spot was selected by centering grid box at Ser238 OG atom (X-axis: −0.544 Å, Y-axis: 28.321 Å, and Z-axis: −12.451 Å) with 15 Å dimensions. For each molecule, 100 iterations were produced and assigned with Vina docking score in AutoDock Vina and GOLD fitness score in GOLD. The complexes were compared and frequent three hits were selected based on highest GOLD fitness score and lowest Vina docking score in kcal/mol. The protein with dock box is provided as Fig. S2. The docking protocol was validated by docking N-[phosphomethyl]glycine (glyphosate) inhibitor to the enzyme active pocket and docking score and binding energy value were calculated. The best docked molecules complex were retrieved in PDB and visualized in UCSF Chimera 1.15 37 50 was used for compounds processing. The complexes were placed into TIP3P water box (12 Å in size) which provided a neutral environment as it contains an appropriate number of counterions. The TIP3P water model offers better performance in calculating specific heats compared to other water models 51 . Hydrogen atoms, solvation box, carbon alpha atoms and all non-heavy atoms of the complexes were energy minimized for 1000 steps. Each system temperature was increased gradually to 310 K for 20 ps in Langevin dynamics 52 . This was followed by equilibration for 100 ps and production run of 100 ns. AMBER CPPTRAJ module 53 was employed to examine complexes stability by plotting different structural deviations analysis versus time. Visual molecular dynamics (VMD) software 54 was used to plot hydrogen bonds formed in each frame of MD simulation trajectories between enzyme and inhibitor.
Pair correlation function -g (r). Pair correlation function (PCF), a radial distribution function, is a highly significant parameter in MD simulations to compute the average interaction density distribution of ligand atom(s) around specific receptor atom(s) 55 . PCF plots were generated for hydrogen bonds between EPSP active pocket residues key to the binding of inhibitors. The PCF analysis was executed using CPPTRAJ and can be presented as, where, ρij is the density of a given receptor atom at distance "r" of the ligand atom. The g (r) functions as a ratio between observed interaction density "ρij" at distance "r" and density of solvent bulk atom "ρj". This ratio is equivalent to ratio between nij(r) and < ρj > 4π r 2 δr. Nij represents the number of bin atoms in spherical volume fragment depending on their width δr. The 4πr2δr is the spherical shell volume having thickness "δr" and at distance "r" from a reference solute atom. g(r) = ρij(r) < ρj > = nij(r) < ρj > 4πr 2 δr www.nature.com/scientificreports/ MM-PB/GBSA studies. MMPBSA.py package 56 of AMBER20 was employed to estimate binding free energies of the systems 27,32 . The main purpose of this analysis was to find out the difference of free energy between two states of the complex i.e. solvated and gas phase using the below equation, From complete simulation trajectories, 100 frames were used as input in both MM/PBSA and MM/GBSA. The selection of 100 frames was done using an input parameter file of AMBER20 MM-GB/PBSA which allowed considering 100 frames from simulation trajectories picked at an equal time interval. The dielectric constant used in MMPBSA.py was 1. Calculation of entropy energy contribution to each complex binding free energy was done using a bash script given by Duan et al. 57 . The total binding energy of MM-GBSA was further decomposed into the net energy contribution of each enzyme residue that is involved in inhibitor interactions.
Cross-validation of binding energy by WaterSwap. A more sophisticated approach called WaterSwap 58,59 was run further to cross-validate MM-PB/GBSA binding free energies. WaterSwap method uses three algorithms Thermodynamic Integration (TI), Free Energy perturbation (FEP), and Bennett's acceptance ratio (BAR) methods to compute system binding energy for default 1000 iterations. The difference in average binding energy value of each of the above methods for the systems is < 1 kcal/mol which demonstrates the systems well converged.
Alanine scanning analysis. Specific residues involved in consistent interactions and stability of complexes were selected for alanine scanning analysis, which was performed using AMBER20 60 . Functional significant residues were targeted based on docking interactions and residue-wise decomposition of MM-GBSA binding free energy. The residues were manually replaced with ALA coordinates and the structures were loaded into LEAP module of AMBER 61 . The initial parameter and coordinate files were generated for a short 20 ns of simulation and subsequent analysis of MM-GBSA was performed. The goal was to look for fluctuations in binding free energies as a result of mutation of the mentioned residues. Details of the alanine scanning methods used in this study can be found in a study carried out by Asma et al. 61 .
Pharmacokinetics studies. Physicochemical properties, medicinal chemistry, druglikeness, and Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET) analysis of the shortlisted compounds were performed through online servers i.e. pkCSM 62 and SwissADME 63 .

Results and discussion
Selecting a high quality protein structure. To EPSP enzyme structure steric clashes were removed by energy minimization process. These steric clashes are high-energy conformations that can trigger physical perturbation and complex instability during simulation 64,65 . However, such minimization may introduce bad contacts in the structure, affecting the overall enzyme structure. As a result, before and after minimization evaluation of the enzyme is critical in determining energy optimized structure for consequent docking and simulation to make the most accurate predictions possible. Pre-minimized of the EPSP Ramachandran plot had 87.21%residues in the favored region, while the energy minimized enzyme Ramachandran plot had 90.7% residues in the favored region. Furthermore, 1% of residues and 0% of residues were plotted in disallowed regions of the Ramachandran plot of pre-minimized and minimized structures, respectively. The energy minimized EPSP Ramachandran plot is presented in Fig. 2.
Unveiling EPSP synthase inhibitory molecules. SDVS yielded three potential and promising inhibitory compounds: Top-1 (CMNPD31561), Top-2 (CMNPD28986), and Top-3 (CMNPD28985) depicting Vina docking score of −8.1 kcal/mol, −7.9 kcal/mol, and −7.7 kcal/mol, respectively. The GOLD fitness score of the compounds is 72.54 for Top-1, 70.87 for Top-2, and 70.12 for Top-3. In contrast, the N-[phosphomethyl]glycine (glyphosate) inhibitor binding energy and GOLD fitness score were −6.3 kcal/mol and 69.23, respectively. Selection of the compounds was based on interactions at the binding site of EPSP and plausible suitable binding pose. Balance interactions of both hydrophilic and hydrophobic nature were witnessed between the compound's chemical moieties and several amino acid residues of the enzyme active pocket. All the three compounds were revealed to be docked at the hinge interface of enzyme and formed close distance contacts with residues Lys23, Glu49, Arg197, Ser238, Ser239, Arg266, Asp319, Lys346, His398, Arg399, and Ser426 (Fig. 3).
Top-1 upon binding produced several key hydrogen bonding with Arg197, Ser238, His398, and Arg399. Top-2 on the other hand, shared the same binding residue Arg399 as visualized in Top-1 but additionally formed hydrogen bonding with Asp236 and Ser239. Top-3 interacts with two residues (Glu49 and Arg197) of the active pocket. Besides these good number of hydrogen bonding, each compound interacts hydrophobically with many residues of the enzyme active site which significantly contribute to the overall compounds stable binding mode at the enzyme active pocket. The chemical interactions of the compounds with enzyme active pocket are given in Fig. 4. 66 experiment was performed to further evaluate the conformational strength of hit molecules with EPSP for 100 ns. The conformational stability of complexes was investigated first by root mean square deviations (RMSD) 67 that measures all carbon alpha atoms deviations considering the docked conformation of the complexes as a reference (Fig. 5A) 68 was determined for the systems that describe fluctuations of protein residues from the original position during simulation. In this analysis, residues of the protein active key in binding inhibitor were also elucidated. As can be analyzed from Fig. 5B, residues of the enzyme are subject to continuous dynamics however, the fluctuations are within acceptable range (~ 3 Å) and contributing to good stable enzyme conformation in the presence of compounds. These fluctuations are the outcome as discussed above due to the larger size of the enzyme and dynamically more active domains 69,70 . The continuous motions of the enzyme domains are natural for enzyme functionality but are not affecting compounds binding and overall chemical interactions network 31 71 was performed next to explain compactness and relaxation of the enzyme structure during simulation (Fig. 5C) . RoG also investigated some minor structural variations that are because of the enzyme loops, which are naturally flexible however, these alterations do not alter stability to the compounds at the docked pocket. Lastly, the number of hydrogen bonds the compounds made to the enzyme active pocket residues were evaluated 72 (Fig. 5D). All the compounds formed a good number of hydrogen bonds with enzyme active pocket residues that are consistent and of close distance suggesting higher intermolecular affinity of the complexes. For comparative analysis, the N-[phosphomethyl]glycine (glyphosate) control was also used in molecular dynamics simulation. The RMSD of control was seen more stable as like Top-2 ad Top-3 despite of some initial   Fig. 6A, B, respectively.

Inter molecular interactions RDF analysis.
RDF is a pair correlation function that estimates the density distribution of interacting radii significant in holding the ligand at the docked site 55,73 . RDF plots for key hydrogen bonding residues were generated to get an insight into the interatomic association of said interactions during simulation period (Fig. 7). For Top-1, two interactions engaging Ser238 and Arg399 residues were plotted that were revealed to be consistent in terms of density distribution. Two maximum density distribution points were observed for Top-1 oxygen atom and Arg399; first at 1.48 Å with g(r) of 0.51, followed by second distribution at 1.62 Å with g(r) of 0.50. For the second interaction of Top-1, Ser238 is attached to the compound with maximum interatomic distribution of 0.28 at a distance of 1.36 Å. It can be concluded from both interactions plot that said interactions are vital for keeping the ligand in close vicinity of the active pocket. Top-2 interaction with Ser239 played less contribution with variable g(r) values at different distances, though the interaction seems to play a critical role in compound recognition and binding. Interaction with Arg399 is more stable and density distribution is maximum at 2.16 Å with g(r) value of 0.40. Top-3 compound contact with Arg197 is more uniform at a distance of 2.6 with g(r) score of 0.28. All these plots suggest high affinity binding of the compounds to enzyme and formation of strong stable complexes. As interactions distance patterns between the compounds and enzyme active site residues are not much affected during simulation time, it can be inferred that binding mode of the compounds is not changed at the enzyme active pocket. As a result of which the interactions network of the compounds with respect to the enzyme remain consistent and in close distance based on which higher affinity of the compounds for the enzyme can be interpreted.
Binding free energies estimation. To validate the affinity of compounds to EPSP, post simulation processing of MM-PB/GBSA was performed to yield different free energies of the complexes. MM-PBSA and MM-GBSA are considered reliable for this purpose as they are more accurate compared to docking predictions 27 . The MM-PBSA is computationally expensive than MM-GBSA but more reliable and accurate. Both methods were used to cross validate the findings. Table 1    The values indicate that entropy contribution is highly unfavorable to systems stability. However, as the net binding energy of the filtered compounds systems is very good, the entropy energy does not impact intermolecular binding significantly.
Hotspot residues energy contribution. Further investigation of the hotspot residues from the enzyme active pocket was done to understand their contribution to the total MM-GBSA binding energy. This was accomplished by decomposing the net energy into residues of the enzyme. Residues with the binding free energy of ≤ 1 kcal/mol were termed hotspot as they are significantly involved in interactions with the compounds. Residues like His398, Arg399, Ser238, Asp319, Arg197, Arg266, Glu342, Lys23, Asp236, Ser426, Leu343, Ile318, Lys346, Arg350, Gly21, Asp22, Ser239, Leu46, Lys23, Glu49, and Arg197 all contribute favourably to the binding of compounds. The binding energy of each of these residues is presented in Fig. 8.
WaterSwap based binding free energy calculation. The MM-PBSA and its counterpart MM-GBSA calculate the Gibbs free energy based on snapshot selected at regular intervals from simulation trajectories. As there is variation in the ligand conformation during simulation, it's very difficult to predict which part of the ligand contributes significantly to the net binding energy 58 . Also, the use of an implicit model in these methods skips the role of water molecules in bridging the ligand and protein 59 . Such limitations can easily be overcome in WaterSwap. As can be seen in Fig. 9 the complexes binding energy is well converged (the difference in the bind- Alanine scanning analysis. The active residues of the enzyme that contribute highly to the net binding free energy and are involved in robust interaction with the inhibitor were mutated to decipher their functional significance. In specific, four residues: Arg197, Ser238, Ser239, and Arg399 were mutated to alanine to bring native structural changes in the enzyme but do not affect the overall conformation of the enzyme. By doing so, we observed a decline in the contribution of these residues as tabulated in Table 2. Druglikeness, medicinal chemistry, pharmacokinetics and toxicity analysis of the compounds. The loss of drugs owing to poor pharmacokinetics in drug development procedures leads to higher development expenses. Screening of promising drugs has been greatly improved by the availability of in silico pharmacokinetic tools 74 . Therefore, a detailed pharmacokinetic analysis of the top hit molecules was done to assist chemists to optimize the structure while maintaining an acceptable range of pharmacokinetics. Table 3 provides a detailed description of the pharmacokinetics of the screened compounds along with druglikeness, medicinal chemistry, and several toxicity analysis. According to the rules of medicinal chemistry, drug absorption is of prime importance and should be assessed in the in silico pharmacokinetics studies first 75 43 . The zero alert for PAINS demonstrates the compounds to selectively bind to the EPSP. Screened molecules are predicted to have poor permeability for blood brain barrier (BBB) 81 and are unable to cross the central nervous system. The molecules are non-inhibitors of cytochrome P450 allowing functional oxidation of xenebiotics 82 . The molecules are predicted to show no hepatotoxicity, skin sensitization, AMES, and carcino toxicity, all these parameters suggesting the molecules to be potential candidates subject to further experimental exploration.

Conclusions
Compounds of natural origin and their molecular frameworks play a significant contribution in the discovery of new drugs 40 . This can be evidenced by the approval of two-thirds of natural source small molecule drugs from January 1981 to September 2019 83 . In particular, natural compounds from oceans have immense potential to become good drug molecules because of extreme biodiversity marine organism secondary metabolites 84 .

Data availability
All the data is available within the manuscript and supplementary material.